Wireless sensor-actor networks are a recent development of wireless networks where both ordinary
sensor nodes and more sophisticated and powerful nodes, called actors, are present. In this paper 
we formalize a recently introduced algorithm that recovers failed actor communication links via the 
existing sensor infrastructure. We prove via refinement that the recovery is terminating in a finite 
number of steps and is distributed, thus self-performed by the actors. Most importantly, we prove 
that the recovery can be done at different levels, via different types of links, such as direct actor links 
or indirect links between the actors, in the latter case reusing the wireless infrastructure of sensors. 
This leads to identifying coordination classes, e.g., for delegating the most security-sensitive
coordination to the direct actor-actor coordination links, the least real-time constrained coordination to
indirect links, and the safety-critical coordination to both direct actor links and indirect sensor paths
between actors. Our formalization is done using the theorem prover in the RODIN platform. 

Keywords: Wireless Sensor Actor Networks (WSANs); Coordination links; Coordination recovery;
Refinement; Event-B; RODIN Tool

1 Introduction 



The separation of computation and control stands at the basis of the software architecture discipline. The 
control of the computing entities as well as the coordination of the controlling entities are well illustrated 
by Wireless Sensor-Actor Networks (WSANs), a rather new generation of sensor networks [6]. A WSAN
is made of two kinds of nodes: sensors (the 'computing' entities) and actors (the controlling entities),
with the density of sensor nodes much higher than that of actor nodes. The sensors detect the events
that occur in the field, gather them and transmit the collected data to the actors. The actors react to the 
events in the environment based on the received information. The sensor nodes are low-cost, low-power 
devices equipped with limited communication capabilities, while the actor nodes are usually mobile, 
more sophisticated and powerful devices compared to the sensor nodes. 

A central WSAN requirement is that of node coordination. As there is no centralized control in a
WSAN, sensors and actors need to coordinate with each other in order to collect information and take
decisions on the next actions [6]. There are three main types of WSAN coordination [13]: sensor-sensor,
sensor-actor and actor-actor coordination, the latter being considered in this paper. The actor-actor
coordination is concerned with the actor decisions and the division of tasks among different actors. To
achieve the actor-actor coordination in WSANs, actors need reliable connection links for communicating
with each other, which are established upon initializing the WSAN. However, WSANs are dynamic
networks where the network topology continuously changes. The changes occur when new links or
nodes are added and when existing links or nodes are removed due to failures, typically generated by
hardware crashes, lack of energy, malfunctions, etc. Thus, actor nodes can fail during the operation of the
network. As a result, a WSAN may transform into several disconnected WSAN sub-networks. This
separation is called network partitioning and is illustrated in Fig. 1, where the failure of actor node A1
is shown to split the remaining actor nodes into separate partitions.
Figure 1: Three partitions created by the failed actor A1



Another central WSAN requirement is that of embedding real-time aspects. Thus, depending on the 
application, it might be essential to respond to sensor inputs within predefined time limits, e.g., in critical 
applications such as forest fire detection. Due to the real-time requirements of WSANs, the failure of 
an actor node should not impact the whole actor network for too long. The problem of actor failure in
actor-actor coordination has already been addressed. For instance, the physical movement of actor
nodes towards each other is proposed in [5, 1] to re-establish their connectivity. However, during this
movement, nodes in different partitions that have been created by the actor failure cannot coordinate. To
shorten the time of recovery, Kamali et al. [11] have previously proposed an algorithm for establishing
new routes between non-failed actors via sensor nodes. This algorithm makes it possible to quickly
reconnect the separated partitions, before moving actor nodes as proposed in [5, 1]. In this paper, we
further study this recovery mechanism that alleviates the actor coordination failure.

There are several properties that are desirable to verify for this algorithm. First, we need to show 
that there is always a path via sensor nodes that can be established by the partitioned actor nodes. Second,
it is desirable to guarantee that this path is the shortest, in order not to overload the power-limited
sensor nodes. Third, to shorten the time of recovery as much as possible, it is desirable to establish the 
connection as soon as possible. In this paper we demonstrate the first property of the algorithm. 

The contribution of this paper is threefold. First, we formalize the algorithm introduced recently for 
self-recovering actor coordination, using a theorem prover tool. This allows us to better understand the 
functioning of the algorithm. Second, we prove the termination of the algorithm, namely we prove that 
the recovery of a link ends in a finite number of steps. This proves that the recovery path can be established
between non-failed actors. An important aspect of the recovery method is that the indirect links
between actors (via sensors or not) are built in a distributed manner, thus ensuring the self-recovering
of the network. Since sensors are so numerous in a WSAN, practically covering the area they sense, we 
thus prove the first desirable property of the algorithm, i.e., that there is always a path via sensor nodes 
that can be established by the partitioned actor nodes. Third, we prove that the recovery can be done at 
different levels, via different types of links, such as direct actor links or indirect links between the actors, 
in the latter case also reusing the WSAN infrastructure of sensors. In fact, the novelty of our contribution 
lies precisely in identifying these different coordination alternatives for actors, which can all be present in
a network to improve its functioning.

The coordination alternatives for actors imply the existence of actor coordination classes, that can be 
assigned various semantics. Assume that a direct actor-actor coordination link is established between two
non-failed actors after an intermediary actor has failed. This can happen only if the two are in range which,
given their previous indirect communication via the intermediary, is not always the case. However,
when this happens, it means that the two actors are rather powerful devices with large enough ranges.
Semantically, we can define a subset of the actor set in a WSAN whose members are strategic to the network
and are so powerful that they can communicate directly with each other even if intermediary actors fail. The
most security-sensitive information and the real-time constrained information can thus be transmitted 
via links among such strategic actors. Indirect links between actors can be used to transmit non-sensitive 
information. The fact that indirect coordination among any two actors can be established through sensors 
provides a fault tolerance support for actor coordination. If the direct actor links are enabled then the 
communication takes place via them, but if not then a 'rescue' route can be established via the sensor 
infrastructure. Thus, the sensor nodes provide the backup infrastructure on which the actor coordination 
can rely. 

In order to prove the local path existence property, we employ the Event-B formal method for
constructing a new actor path. Event-B [3] is an extension of the B formalism [4] for specifying
distributed and reactive systems. A system model is gradually specified on several levels of abstraction,
always ensuring that a more concrete model is a correct implementation of an abstract model. The
language and proof theory of Event-B are based on logic and set theory. The correctness of the stepwise
construction of formal models is ensured by discharging a set of proof obligations: if these obligations 
hold, then the development is mathematically shown to be correct. Event-B comes with the associated 
tool RODIN [14, 15], which automatically discharges part of the proof obligations and also provides the
means for the user to discharge interactively the remaining proofs. 

The paper is organized as follows. In Section 2 we briefly overview the Event-B formalism and 
present the recovery algorithm. In Section 3 we model the direct actor links recovery mechanism and 
in Section 4 the indirect actor links recovery. In Section 5 we model the sensor infrastructure and the 
indirect actor links recovery via the sensors. In Section 6 we present the proof statistics of our model and 
in Section 7 we further discuss the contribution of our paper. In Section 8 we briefly conclude the
paper. 

2 Preliminaries 

This section briefly overviews our modeling formalism Event-B and also describes the recovery
algorithm to be modeled in this paper.

Event-B. Each Event-B model consists of two components called context and machine. A context
describes the static part of the model, i.e., it introduces new types and constants. The properties of these
types and constants are gathered as a list of axioms. A machine represents the dynamic part of the model,
consisting of variables that define the state of the model and operations called events. The structure of an
Event-B machine is given in Fig. 2. The system properties that should be preserved during the execution
are formulated as a list of invariant predicates over the state of the model.

MACHINE machine-name
VARIABLES list of variables
INVARIANTS list of invariants/predicates
EVENTS
  INITIALISATION
  BEGIN
    list of actions
  END

  event-name
  WHEN
    list of guards
  THEN
    list of actions
  END
END

Figure 2: MACHINE definition in Event-B

An event, modeling state changes, is composed of a guard and an action. The guard is the necessary
condition under which an event might occur; if the guard holds, we call the event enabled. The action
determines the way in which the state variables change when the event occurs. For initializing the system,
a sequence of actions is defined. When the guards of several events hold at the same time, then only one
event is non-deterministically chosen for execution. If some events have no variables in common and are
enabled at the same time, then they can be considered to be executed in parallel since their sequential
execution in any order gives the same result.
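To make the guard/action execution model concrete, the following is a minimal Python sketch and not part of the Event-B development. The toy machine (one counter variable, the events inc and dec, and the invariant 0 ≤ count ≤ 3) is a hypothetical example chosen only for illustration. At each step, the guards are evaluated, one enabled event is picked arbitrarily, and the invariant is re-checked on the resulting state.

```python
import random

def invariant(state):
    """Invariant of the toy machine: the counter stays within 0..3."""
    return 0 <= state["count"] <= 3

# Each event is a (guard, action) pair over the machine state.
EVENTS = {
    "inc": (lambda s: s["count"] < 3, lambda s: {"count": s["count"] + 1}),
    "dec": (lambda s: s["count"] > 0, lambda s: {"count": s["count"] - 1}),
}

def step(state, rng):
    """Fire one arbitrarily chosen enabled event; return None on deadlock."""
    enabled = [name for name, (guard, _) in EVENTS.items() if guard(state)]
    if not enabled:
        return None
    name = rng.choice(enabled)
    new_state = EVENTS[name][1](state)
    assert invariant(new_state), f"{name} violates the invariant"
    return name, new_state

state = {"count": 0}  # plays the role of INITIALISATION
rng = random.Random(0)
trace = []
for _ in range(8):
    fired = step(state, rng)
    if fired is None:
        break
    name, state = fired
    trace.append(name)
```

In this toy machine at least one guard always holds, so the loop never deadlocks. A simulation of this kind only samples behaviours, whereas the proof obligations discharged in RODIN cover all of them.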

A model is developed by a number of correctness preserving steps called refinements. One form 
of model refinement can add new data and new behavior events on top of the already existing data and 
behavior but in such a way that the introduced behavior does not contradict or take over the abstract 
machine behavior. In addition to this superposition refinement [12], we may also use other refinement
forms, such as algorithmic refinement [7]. In this case, an event of an abstract machine can be refined by 
several corresponding events in a refined machine. This will model different branches of execution, that 
can for instance take place in parallel and thus can improve the algorithmic efficiency. 

The recovery algorithm. In this algorithm, the detection of a failed node triggers the reconstruction of the
communication links among non-failed actor nodes via sensor nodes. The mechanism has three
parts: detecting a failed actor, selecting the shortest path, and establishing the selected path through 
sensor nodes. When actor neighbors of an actor node do not receive any acknowledgment from that 
actor node, they detect it as failed. At this time, the neighbors of the failed node have to investigate 
whether this failure has produced separated partitions. If there is no partitioning, then nothing is done 
except updating the neighbor lists in nodes. However, if there are some separated partitions, a new path 
should be selected and established. 
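The partition check performed by the neighbors of a failed actor can be illustrated with a small Python sketch. It is not taken from the algorithm's specification: the adjacency-dictionary representation, the function names and the five-node example are assumptions made here, and the sketch uses a global breadth-first search where the algorithm itself relies on local neighbor information.

```python
from collections import deque

def reachable(links, start):
    """Return all nodes reachable from start over undirected links (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def partitions_after_failure(links, failed):
    """Group the neighbors of `failed` by the partition they end up in."""
    # Drop the failed node and every link that touches it.
    remaining = {n: {m for m in ms if m != failed}
                 for n, ms in links.items() if n != failed}
    groups = []
    for neighbor in sorted(links[failed]):
        if not any(neighbor in g for g in groups):
            groups.append(reachable(remaining, neighbor))
    return groups

# Hypothetical actor network: A1 is the only bridge between {A2, A3} and {A4, A5}.
links = {
    "A1": {"A2", "A3", "A4"},
    "A2": {"A1", "A3"},
    "A3": {"A1", "A2"},
    "A4": {"A1", "A5"},
    "A5": {"A4"},
}
groups = partitions_after_failure(links, "A1")
```

If the function returns more than one group, the failure has partitioned the network and a new path has to be established between the groups; a single group means that only the neighbor lists need updating.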

We assume that each actor node has information about its immediate neighbors (1-hop neighbors) 
and 2-hop neighbors (the neighbors of the neighbors). Based on this information, the non-failed actors 
can recover their communication links upon detecting a failed (intermediary) actor. These links are 
formed based on the node degree information (the number of immediate neighbors) and on the relative 
distance between actor nodes. In this paper we focus only on the self-recovery mechanism via sensors, 
due to lack of space. The complete model of the algorithm can be found in [10].

Our formalization shows that, given networks of sensors and actors, the actors can always reconstruct 
coordination links between themselves, by using local information and sensors as intermediate nodes. In
order to prove this property, we model the network at three increasing levels of detail so that each model 
is a refinement of the previous one. In the initial model, we specify the actor network and the recovery 
mechanism of the direct actor links. In the second model, we add new data and events to model the list 
of 1-hop and 2-hop neighbors for every node and model the recovery via indirect actor links. In the third 
model, we distinguish among sensor and actor nodes and their corresponding networks and model the 
recovery via the sensor infrastructure. In the following, we describe these three recovery models. 



3 Recovery via Direct Actor Links 

The context of our initial model contains the definition of constants and sets as well as our model
assumptions as axioms. A finite (axiom 6) and non-empty (axiom 7), generic set NODE describes all the
network nodes. These nodes can be either sensor nodes or actor nodes, hence the set NODE is
partitioned into the sensors and actors sets, where sensors and actors are predefined sets of sensors and
actors respectively (axiom 8). STATUS denotes the set {ok, fail}, where the constant fail stands for a 
failed node and the constant ok for a non-failed node (axiom 1). We also define the closure constant that 
models the transitive closure of a binary relation on the set NODE (axioms 2-5). We use this constant in 
order to dynamically construct all the possible paths for the current network. 



constants closure ok fail
sets NODE STATUS sensors actors
axioms
  @axm1 partition(STATUS, {ok}, {fail})
  @axm2 closure ∈ (NODE ↔ NODE) → (NODE ↔ NODE)
  @axm3 ∀r · r ⊆ closure(r)
  @axm4 ∀r · closure(r);r ⊆ closure(r)
  @axm5 ∀r,s · r ⊆ s ∧ s;r ⊆ s ⇒ closure(r) ⊆ s
  @axm6 finite(NODE)
  @axm7 NODE ≠ ∅
  @axm8 partition(NODE, sensors, actors)
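As an informal reading of axioms 3-5, the Python sketch below computes the smallest relation that contains r and is closed under forward composition with r, which is what the three axioms characterize. The representation of relations as sets of pairs and the function names are assumptions of this sketch, not part of the Event-B context.

```python
def compose(r, s):
    """Forward relational composition r;s on sets of pairs."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def closure(r):
    """Smallest relation containing r that is closed under composition with r."""
    result = set(r)
    while True:
        extra = compose(result, r) - result
        if not extra:
            return result
        result |= extra

r = {(1, 2), (2, 3)}
c = closure(r)                         # adds the derived pair (1, 3)
loop = closure({(1, 2), (2, 1)})       # a symmetric link between 1 and 2
```

For a symmetric relation such as ANET, this construction relates every node that has at least one link to itself, as the value of loop shows.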



In the machine part of our initial model we have six events and seven invariants as shown below. The status
of each node (non-failed or failed) is modeled with the function Status mapping each node in NODE to
ok or fail (invariant 1). The relation ANET denotes the bidirectional, non-failed actor links (invariants 2
and 6). This relation is non-reflexive (invariant 4), expressed with the domain restriction operator ◁, and
symmetric (invariant 5), expressed with the relation inverse operator ~. This means that an ANET link
from a node to itself is prohibited and if node a has a link with node b, then node b also has a link with
node a. The set FailedNodeNeigh contains only non-failed actors (invariants 3 and 7). This set is updated when
a node is detected as failed. We also model that the network is active continuously with theorem THM1
that ensures that, always, at least one event is enabled (i.e., the disjunction of all the event guards is
true). We express this constraint with a theorem instead of an invariant for technical reasons detailed
in [10].
INVARIANTS
  @inv1 Status ∈ NODE → STATUS
  @inv2 ANET ∈ actors ↔ actors
  @inv3 FailedNodeNeigh ⊆ actors
  @inv4 actors ◁ id ∩ ANET = ∅
  @inv5 ANET = ANET~
  @inv6 ∀n,m · n ↦ m ∈ ANET ⇒ Status(n) = ok ∧ Status(m) = ok
  @inv7 FailedNodeNeigh ⊆ (Status~[{ok}])
  theorem @THM1
    (∃n · Status(n) = fail ∧ n ∈ actors ∧ FailedNodeNeigh = ∅)
    ∨ (∃n,m · Status(n) = ok ∧ Status(m) = ok ∧ n ↦ m ∉ ANET
         ∧ n ∈ actors ∧ m ∈ actors ∧ n ≠ m ∧ FailedNodeNeigh = ∅)
    ∨ (∃n · Status(n) = ok ∧ n ∈ actors ∧ FailedNodeNeigh = ∅)
    ∨ (∃n,k · n ∈ FailedNodeNeigh ∧ k ∈ FailedNodeNeigh
         ∧ n ↦ k ∉ closure(ANET))
    ∨ (∃n,k · n ∈ FailedNodeNeigh ∧ k ∈ FailedNodeNeigh
         ∧ n ↦ k ∈ closure(ANET))
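The invariants can also be read as executable checks on a concrete state. The following Python sketch is only an illustration: the representation (a dictionary for Status, a set of pairs for ANET) and the function name check_invariants are assumptions of this sketch, and invariant 1 is taken for granted by using a dictionary with one entry per node.

```python
def check_invariants(actors, status, anet, failed_neigh):
    """Return the labels of the violated invariants (empty list if none)."""
    violated = []
    if not all(n in actors and m in actors for n, m in anet):
        violated.append("inv2")                     # ANET relates actors only
    if not failed_neigh <= actors:
        violated.append("inv3")
    if any(n == m for n, m in anet):
        violated.append("inv4")                     # no link from a node to itself
    if anet != {(m, n) for n, m in anet}:
        violated.append("inv5")                     # links are bidirectional
    if any(status[n] != "ok" or status[m] != "ok" for n, m in anet):
        violated.append("inv6")                     # both link ends are non-failed
    if any(status[n] != "ok" for n in failed_neigh):
        violated.append("inv7")
    return violated

actors = {"a", "b", "c"}
status = {"a": "ok", "b": "ok", "c": "fail"}
good = check_invariants(actors, status, {("a", "b"), ("b", "a")}, set())
one_way = check_invariants(actors, status, {("a", "b")}, set())
```

Such a check tests one state at a time; in the Event-B development the invariants are instead proved to be preserved by every event.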



The initialisation event sets the status of all the nodes to fail; therefore, the ANET relation should be
empty based on invariant 6. The set FailedNodeNeigh is set to ∅ because it is a subset of the non-failed
nodes (invariant 7) and there are no non-failed nodes in the network at initialization.

Apart from the initialisation, the events in the initial model add actor nodes (AddNode) and actor links
(AddLink), remove actor nodes and their corresponding links (RemoveNode) and also abstractly recover 
connections when an actor fails (FaultDetRec and FaultDetRec2). 



INITIALISATION
then
  @act1 Status := NODE × {fail}
  @act2 ANET := ∅
  @act3 FailedNodeNeigh := ∅
end



In the AddNode event, adding an actor overwrites its entry in the function Status from fail to ok.



AddNode
any n where
  @grd1 Status(n) = fail
  @grd2 FailedNodeNeigh = ∅
  @grd3 n ∈ actors
then
  @act1 Status(n) := ok
end





In the AddLink event we add a link in both directions, to meet the symmetry required by invariant 5.



AddLink
any n m where
  @grd1 Status(n) = ok ∧ Status(m) = ok
  @grd2 n ↦ m ∉ ANET
  @grd3 n ≠ m
  @grd4 n ∈ actors ∧ m ∈ actors
  @grd5 FailedNodeNeigh = ∅
then
  @act1 ANET := ANET ∪ {n ↦ m, m ↦ n}
end
The RemoveNode event changes the status of an actor from ok to fail; also, all the links of that actor
are removed from ANET, expressed with the domain subtraction operator ⩤ and the range subtraction
operator ⩥. In addition, the neighbors of that actor become members of the FailedNodeNeigh set. This
means that we model the situation where one actor fails at a time, because RemoveNode is not enabled
again until FailedNodeNeigh becomes empty again (guard 3). Although restrictive, we choose this
approach in this paper for simplicity. We observe that even if the network is left with one or zero
non-failed actors, our algorithm does not deadlock because the AddNode event is still enabled.



RemoveNode
any n where
  @grd1 Status(n) = ok
  @grd2 n ∈ actors
  @grd3 FailedNodeNeigh = ∅
then
  @act1 Status(n) := fail
  @act2 ANET := {n} ⩤ ANET ⩥ {n}
  @act3 FailedNodeNeigh := ANET[{n}]
end
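A possible operational reading of RemoveNode is sketched below in Python. The dictionary-based state and the function name remove_node are assumptions of this sketch. Since all actions of an Event-B event read the state before the event, the neighbor set ANET[{n}] is computed before the links of n are deleted.

```python
def remove_node(state, n):
    """Apply RemoveNode to `state` if its guards hold; return True if it fired."""
    status, anet = state["Status"], state["ANET"]
    # Guards 1-3: n is a non-failed actor and no recovery is pending.
    if status.get(n) != "ok" or n not in state["actors"] or state["FailedNodeNeigh"]:
        return False
    # Relational image ANET[{n}], taken on the state before the event.
    neighbors = {m for (a, m) in anet if a == n}
    status[n] = "fail"
    # Domain and range subtraction: drop every pair that mentions n.
    state["ANET"] = {(a, b) for (a, b) in anet if a != n and b != n}
    state["FailedNodeNeigh"] = neighbors
    return True

state = {
    "actors": {"a", "b", "c"},
    "Status": {"a": "ok", "b": "ok", "c": "ok"},
    "ANET": {("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")},
    "FailedNodeNeigh": set(),
}
first = remove_node(state, "a")    # fires: b and c become FailedNodeNeigh
second = remove_node(state, "b")   # blocked by guard 3 until recovery completes
```

The second call illustrates the one-failure-at-a-time restriction discussed above.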



Removing an actor from the network can lead to separated network partitions. The event FaultDetRec
detects whether removing an actor has created separated partitions or not. If two neighbors of a failed
actor have no connection through other actors (i.e., there is no path from one node to the other,
expressed by guard 2 of FaultDetRec), then a partition is formed. To recover from this partitioning of
communication, a direct actor-actor link is established (action 1). As FailedNodeNeigh is a subset of a
finite set (invariant 7, invariant 1, axiom 6), we observe that the FaultDetRec event can be enabled
only a finite number of times, hence the recovery operation terminates. Technically, this is true
because card(FailedNodeNeigh) decreases at each execution of FaultDetRec and eventually the guard
of FaultDetRec will no longer hold. We observe that n ≠ k due to guard 2 (the closure construct is
reflexive).



FaultDetRec
any n k where
  @grd1 n ∈ FailedNodeNeigh ∧ k ∈ FailedNodeNeigh
  @grd2 n ↦ k ∉ closure(ANET)
then
  @act1 ANET := ANET ∪ {n ↦ k, k ↦ n}
  @act2 FailedNodeNeigh := FailedNodeNeigh \ {n}
end



The FaultDetRec2 event treats the situation when a failure is detected but an alternative path already
exists between the neighbors of the failed actor (n ↦ k ∈ closure(ANET)). In this case, FailedNodeNeigh
is simply updated. We observe that in the case n = k, the last element of FailedNodeNeigh is removed
by this event.



FaultDetRec2
any n k where
  @grd1 n ∈ FailedNodeNeigh ∧ k ∈ FailedNodeNeigh
  @grd2 n ↦ k ∈ closure(ANET)
then
  @act2 FailedNodeNeigh := FailedNodeNeigh \ {n}
end
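The interplay of FaultDetRec and FaultDetRec2, and the termination argument based on card(FailedNodeNeigh), can be sketched in Python as follows. This is an illustration under stated assumptions: reachability is treated as reflexive, following the remark above that the closure construct is reflexive, and the non-deterministic choice of n and k is replaced by a fixed, repeatable one. The names connected and recover are introduced here.

```python
def connected(anet, n, k):
    """True if k is reachable from n over ANET (every node reaches itself)."""
    seen, stack = {n}, [n]
    while stack:
        a = stack.pop()
        for (p, q) in anet:
            if p == a and q not in seen:
                seen.add(q)
                stack.append(q)
    return k in seen

def recover(anet, failed_node_neigh):
    """Run the two recovery events until FailedNodeNeigh is empty."""
    anet, pending = set(anet), set(failed_node_neigh)
    steps = 0
    while pending:
        n = min(pending)                     # arbitrary but repeatable choice
        cut_off = [k for k in sorted(pending) if not connected(anet, n, k)]
        if cut_off:                          # FaultDetRec: add a direct link
            k = cut_off[0]
            anet |= {(n, k), (k, n)}
        # Otherwise FaultDetRec2 applies and no link is added.
        pending.remove(n)                    # the set shrinks by one element
        steps += 1
    return anet, steps

# Neighbors b, c, d of a failed actor; b and c are still linked, d is cut off.
anet, steps = recover({("b", "c"), ("c", "b")}, {"b", "c", "d"})
```

Each iteration removes exactly one element from the pending set, so the loop runs as many times as there were neighbors of the failed actor, which mirrors the decreasing cardinality used in the termination argument.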
Overall, the initial model presented in this section describes the non-deterministic addition and removal
of actor nodes and actor links in a dynamic (wireless sensor-actor) network for which the network
partitioning problem can be detected and recovered from via direct actor links. The recovery assumes some
global network knowledge, expressed by closure(ANET). Also, the recovery mechanism
establishes direct links among the non-failed actor neighbors of the failed actor. Both the recovery
assumption and the recovery mechanism can be used in practice only for strategic actors, i.e., actors
whose range is sufficiently large to check closure(ANET) and establish direct actor links. The following
model considers more localized assumptions as well as indirect recovery paths.



4 Recovery via Indirect Actor Links 

In the previous model we have considered the actor network able to access knowledge about itself, while
in the model in this section we assume that each actor has access only to information about its 1-hop
neighbors and 2-hop neighbors. We now refine the initial model and define a new relation l_net that, for each
actor, keeps track of the 1-hop and 2-hop neighbors. The relation l_net relates three nodes as defined by
invariant 1 below and is non-reflexive, modeled by invariant 2 below. The meaning of this relation is that
a 1-hop neighbor m of a node n is denoted by n ↦ m ↦ m ∈ l_net and a 2-hop neighbor m of a node n is
denoted by n ↦ m ↦ k ∈ l_net. In the first example, m is locally related to n via m (itself, i.e., via a direct
link) and in the second example m is locally related to n via k (i.e., m is a 2-hop neighbor of n, while k
is a 1-hop neighbor of n). The relation l_net describes all these localized links between nodes. The goal
of this refinement step is to supplement the global knowledge of the network in the initial model (via
closure(ANET)) with a localized knowledge formalized with the relation l_net.



@inv1 l_net ∈ actors × actors ↔ NODE
@inv2 actors ◁ id ∩ dom(l_net) = ∅



When a new link is added between two actors the l_net relation also needs to be updated. Therefore, the
AddLink event is extended to also add links to l_net. For every two nodes n and m which have a direct
link, n ↦ m ↦ m and m ↦ n ↦ n are added, meaning that n has a link with m through m (m is a 1-hop
neighbor of n) and m has a link with n through n (n is a 1-hop neighbor of m).



AddLink
extends AddLink
then
  @act2 l_net := l_net ∪ {n ↦ m ↦ m, m ↦ n ↦ n}
end



The Addl_net2hopLink event is a newly introduced event that handles the addition of 2-hop neighbor 
links for actors. If an actor has a direct link with two other actors, then these actors will be 2-hop 
neighbors of each other: 
Addl_net2hopLink
any n m k where
  @grd1 Status(n) = ok ∧ Status(m) = ok ∧ Status(k) = ok
  @grd2 m ↦ k ↦ k ∈ l_net ∧ n ↦ m ↦ m ∈ l_net ∧
        n ↦ k ↦ m ∉ l_net ∧ k ↦ n ↦ m ∉ l_net
  @grd3 m ≠ n ∧ n ≠ k ∧ m ≠ k
  @grd4 FailedNodeNeigh = ∅
then
  @act1 l_net := l_net ∪ {n ↦ k ↦ m, k ↦ n ↦ m}
end
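The content that l_net accumulates through the AddLink and Addl_net2hopLink events can be illustrated by deriving it in one pass from a given set of direct links. In the Python sketch below, the flat triple (n, m, via) stands for the Event-B maplet n ↦ m ↦ via; this encoding and the function name local_view are assumptions of the sketch.

```python
def local_view(anet):
    """Build l_net triples (n, m, via) from a symmetric set of direct links."""
    # 1-hop neighbors: m is reached from n via m itself.
    l_net = {(n, m, m) for (n, m) in anet}
    # 2-hop neighbors: k is reached from n via a common direct neighbor m.
    for (n, m) in anet:
        for (m2, k) in anet:
            if m2 == m and k != n:
                l_net.add((n, k, m))
    return l_net

# Hypothetical chain a - b - c: a and c are 2-hop neighbors via b.
anet = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
l_net = local_view(anet)
```

Because ANET is symmetric, the triple for k seen from n and the triple for n seen from k are both produced, matching the pair of maplets added by action 1 of Addl_net2hopLink.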



When removing an actor node, all its connections should be removed. Thus, in the RemoveNode event a
new action is added which removes all the immediate links with the failed actor in the l_net relation. The
expression {n} × dom(ANET) × dom(ANET) describes all the links of n, either direct connections
(1-hop neighbors) or indirect connections (2-hop neighbors), and the expression dom(ANET) × {n} × {n}
describes all the links between immediate neighbors of n and n.



RemoveNode
extends RemoveNode
then
  @act4 l_net := l_net \ (({n} × dom(ANET) × dom(ANET)) ∪
                          (dom(ANET) × {n} × {n}))
end



We now need to model the detection of failed nodes and the recovery of links based on local information 
instead of being based on all the network topology as described by closure(ANET). We now use l_net
information in addition to ANET for detecting an actor failure (guard 3) and recovering links in the 
FaultDetRec event. 



FaultDetRec
extends FaultDetRec
any n m k where
  @grd3 n ↦ k ↦ m ∈ l_net ∧ n ↦ m ↦ m ∉ l_net ∧
        k ↦ n ↦ m ∈ l_net ∧ k ↦ m ↦ m ∉ l_net
then
  @act3 l_net :| l_net' ⊆ (l_net \ ({n ↦ k ↦ m, k ↦ n ↦ m} ∪
            (ANET[{n}] × {m} × {n}) ∪ (ANET[{k}] × {m} × {k})))
          ∪ (ANET[{k}] × {n} × {k}) ∪ ({n} × ANET[{k}] × {k})
          ∪ (ANET[{n}] × {k} × {n}) ∪ ({k} × ANET[{n}] × {n})
          ∪ ({k} × {n} × (NODE \ {m})) ∪ ({n} × {k} × (NODE \ {m}))
end



When actor m is detected as failed, the neighbors of m (n and k) that have a connection with each other
through m (n ↦ k ↦ m and k ↦ n ↦ m) need to find an alternative path toward each other. If there
is no other route in ANET (n ↦ k ∉ closure(ANET)), then l_net should be updated in two phases, by
removing expired links and adding new routes. Since m has failed, links between n and k through m are
no longer valid, so n ↦ k ↦ m and k ↦ n ↦ m are removed from l_net. In addition, links describing
the immediate neighbors of n (ANET[{n}]) and of k (ANET[{k}]) to m via n and k, respectively, are
removed from l_net. The second phase of the updating process is adding new links to connect n and k.
In this refinement, since we still have no information about sensors, we define that actor n can establish a
link with actor k through any node except m, which has failed: {n} × {k} × (NODE \ {m}), and similarly for
actor k to establish a new link with actor n: {k} × {n} × (NODE \ {m}). When node n establishes a link
with k, the neighbors of n also need to add node k to their 2-hop neighbors list (ANET[{n}] × {k} × {n}).
Moreover, the neighbors of k need to add n to their 2-hop neighbors list (ANET[{k}] × {n} × {k}). The
updating process of l_net is described by action 3 in the FaultDetRec event.

We add a new action to event FaultDetRec2 that updates l_net by removing all the links with the failed 
actor or through it. 



FaultDetRec2
extends FaultDetRec2
any n m k where
  @grd3 n ↦ k ↦ m ∈ l_net ∧ n ↦ m ↦ m ∉ l_net ∧
        k ↦ n ↦ m ∈ l_net ∧ k ↦ m ↦ m ∉ l_net
then
  @act2 l_net := l_net \ ({n ↦ k ↦ m, k ↦ n ↦ m} ∪
          (ANET[{n}] × {m} × {n}) ∪ (ANET[{k}] × {m} × {k}))
end



We observe that l_net is an elegant data structure relating two actor nodes in its domain via a third
node in its range. The model described in this section is a formal refinement of the one in the previous 
section. This means that the old invariants still hold for the extended model, in addition to the two new 
ones. Moreover, the link recovery terminates in a finite number of steps. Indirect links between actors 
are now established non-deterministically based on localized information. These types of links can be 
further refined to a more deterministic form, as we show in the following section. 



5 Sensor-Based Recovery 

In the previous model, we have defined l_net as a subset of actor relations which, after a failure detection,
is updated non-deterministically due to the lack of knowledge about sensor nodes. In this model, we add
sensor nodes and specify more concretely how replacement links through sensors are added after
detecting an actor failure. We introduce two new relations on NODE (invariant 1 and invariant 2), SNET and
SANET, the former representing links among sensor nodes and the latter depicting links between sensor 
and actor nodes. 



@inv1 SNET ∈ sensors ↔ sensors
@inv2 SANET ∈ NODE ↔ NODE
@inv3 SNET ∩ ANET = ∅
@inv4 ANET ∩ SANET = ∅
@inv5 SNET ∩ SANET = ∅
@inv6 SNET = SNET~
@inv7 SANET = SANET~
@inv8 sensors ◁ id ∩ SNET = ∅
@inv9 NODE ◁ id ∩ SANET = ∅
@inv10 ∀n,m · n ↦ m ∈ SANET ⇒
         (n ∈ actors ∧ m ∈ sensors) ∨ (m ∈ actors ∧ n ∈ sensors)
@inv11 ∀n,m · n ↦ m ∈ SNET ⇒ Status(n) = ok ∧ Status(m) = ok
@inv12 ∀n,m · n ↦ m ∈ SANET ⇒ Status(n) = ok ∧ Status(m) = ok
@inv13 ∀n,k,x,y · n ↦ k ↦ x ∈ l_net ∧ k ↦ n ↦ y ∈ l_net ∧
         x ∈ sensors ∧ y ∈ sensors ⇒
         x ∈ SANET[{n}] ∧ y ∈ SANET[{k}] ∧ x ↦ y ∈ closure(SNET)
These relations describe links between nodes at a different level, hence they are disjoint from the actor
links modeled by ANET (invariant 3 and invariant 4). SNET and SANET are also disjoint sets (invariant
5). Moreover, they are symmetric and non-reflexive relations as shown by invariants 6-9. We also formalize
that for each link n ↦ m in SANET one of these nodes should be a sensor node and the other one should
be an actor node (invariant 10). The next two invariants (invariants 11 and 12) model that every node of a
link in either SNET or SANET should be non-failed. Invariant 13 models that if there is a link between
two actor nodes via sensor nodes in l_net, the involved sensor nodes are within the range of l_net, the
respective actor-sensor links belong to SANET and the sensors themselves have at least one path toward
each other within closure(SNET).

In the previous model, removing an actor and all its connections was modeled by the RemoveNode 
event. In this model we refine RemoveNode by adding a new action for updating SANET after removing 
an actor node (action 5). Also, all connections through sensor nodes towards a failed node should be 
removed from Ijiet (action 4). 



RemoveNode
refines RemoveNode
then
  @act4 l_net := l_net \ (({n} × dom(ANET) × dom(ANET)) ∪
          (dom(ANET) × {n} × {n}) ∪ (dom(ANET) × {n} × dom(SNET)))
  @act5 SANET := {n} ⩤ SANET ⩥ {n}
end



In this model we have two new events for adding links between sensor nodes in SNET and links between 
sensor and actor nodes in SANET: AddSLink and AddSALink. 



AddSLink
any n m where
  @grd1 Status(n) = ok ∧ Status(m) = ok
  @grd2 n ↦ m ∉ SNET
  @grd3 n ≠ m
  @grd4 n ∈ sensors ∧ m ∈ sensors
then
  @act1 SNET := SNET ∪ {n ↦ m, m ↦ n}
end





AddSALink
any n m where
  @grd1 Status(n) = ok ∧ Status(m) = ok
  @grd2 (n ∈ actors ∧ m ∈ sensors) ∨ (n ∈ sensors ∧ m ∈ actors)
  @grd3 n ↦ m ∉ SANET
  @grd4 n ≠ m
then
  @act1 SANET := SANET ∪ {n ↦ m, m ↦ n}
end



The AddSLink event is similar to AddLink with a different guard that models that, for every maplet n ↦ m
added in AddSLink, n and m should be sensor nodes. The AddSALink event is for adding links between
sensors and actors.
Self-Recovering Sensor-Actor Networks



The event FaultDetRec, which models the recovery mechanism after an actor failure, is refined using
information from SNET and SANET. Compared to the previous version of the event, there are two additional
parameters x, y as sensor nodes which have connections with actor nodes n and k, respectively. Also, x 
and y have either a direct link or an indirect one towards each other, via closure(SNET). Moreover, the 
actors n and k have no connection with each other (guard 7). 



FaultDetRec
refines FaultDetRec
any n m k x y where
  @grd4 x ∈ SANET[{n}] ∧ y ∈ SANET[{k}]
  @grd5 x ↦ y ∈ closure(SNET)
  @grd6 m ∈ actors
  @grd7 n ↦ k ∉ dom(l_net \ {n ↦ k ↦ m})
then
  @act3 l_net := (l_net \ ({n ↦ k ↦ m, k ↦ n ↦ m}
            ∪ (ANET[{n}] × {m} × {n}) ∪ (ANET[{k}] × {m} × {k})))
          ∪ (ANET[{k}] × {n} × {k}) ∪ ({n} × ANET[{k}] × {k})
          ∪ (ANET[{n}] × {k} × {n}) ∪ ({k} × ANET[{n}] × {n})
          ∪ {n ↦ k ↦ x, k ↦ n ↦ y}
end















Action 3 of FaultDetRec was non-deterministic in the previous model. We now refine this
assignment to a deterministic one: we replace {k} × {n} × NODE \ {m} with k ↦ n ↦ y and,
similarly, {n} × {k} × NODE \ {m} with n ↦ k ↦ x.
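To make the refined event concrete, the sketch below evaluates guards 4 to 7 and the deterministic action 3 on finite sets. It is an illustrative reading rather than the Rodin model: relations are Python sets of tuples, closure is computed by naive iteration, guards inherited from the abstract event are not checked, and the helper names closure, image, triples and fault_det_rec are assumptions of the sketch.

```python
from itertools import product


def closure(rel):
    """Transitive closure of a binary relation stored as a set of pairs."""
    result = set(rel)
    while True:
        step = {(a, d) for (a, b) in result for (c, d) in result if b == c}
        if step <= result:
            return result
        result |= step


def image(rel, nodes):
    """Relational image rel[nodes]."""
    return {b for (a, b) in rel if a in nodes}


def triples(xs, ys, zs):
    """Cartesian product of three sets as a set of triples."""
    return set(product(xs, ys, zs))


def fault_det_rec(n, m, k, x, y, actors, anet, snet, sanet, l_net):
    """Return the new l_net after recovery, or None when guards 4-7 do not hold."""
    guards = (
        x in image(sanet, {n}) and y in image(sanet, {k}),            # grd4
        (x, y) in closure(snet),                                      # grd5
        m in actors,                                                  # grd6
        (n, k) not in {(a, b) for (a, b, _) in l_net - {(n, k, m)}},  # grd7
    )
    if not all(guards):
        return None
    nbrs_n, nbrs_k = image(anet, {n}), image(anet, {k})
    # Links that were routed through the failed actor m.
    removed = ({(n, k, m), (k, n, m)}
               | triples(nbrs_n, {m}, {n}) | triples(nbrs_k, {m}, {k}))
    # Links re-established via k, via n, and via the sensors x and y.
    added = (triples(nbrs_k, {n}, {k}) | triples({n}, nbrs_k, {k})
             | triples(nbrs_n, {k}, {n}) | triples({k}, nbrs_n, {n})
             | {(n, k, x), (k, n, y)})
    return (l_net - removed) | added
```

The only choice left to the caller is the pair of sensors x and y; once they are fixed, the new value of l_net is fully determined, which is the point of the refinement.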

The action in the FaultDetRec2 event is unchanged. However, we strengthen the guard of the event by 
adding guard 6 that guarantees the existence of a link between two direct neighbors of a failed node via 
other nodes than the failed one. 



FaultDetRec2
  refines FaultDetRec2
where
  @grd6 n ↦ k ∈ dom(l_net \ {n ↦ k ↦ m})
end
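Guard 6 of FaultDetRec2 and guard 7 of FaultDetRec test the same membership with opposite outcomes, so for a given triple of nodes the two guards cannot hold together. A small sketch of that test, again over sets of triples and with a hypothetical function name:

```python
def has_alternative_link(n, k, m, l_net):
    """True iff n ↦ k is in the domain of l_net with the triple n ↦ k ↦ m removed."""
    remaining = l_net - {(n, k, m)}
    return any((a, b) == (n, k) for (a, b, _) in remaining)
```

Under this reading, FaultDetRec2 requires has_alternative_link to be true, while FaultDetRec requires it to be false.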



In this third model we uncover the sensor infrastructure and employ it for the actor recovery. This
model is a refinement of the previous models, respecting all the introduced invariants, old and new. The
recovery terminates in a finite number of steps, as in the previous two models. The third model
illustrates the usage of sensors as a fault tolerance mechanism for the actor coordination.



An Additional Model. In the model presented in this section, we re-establish connections (through
sensor nodes) between pairs of actors that were direct neighbors of a failed actor node. However, this
is not an optimal mechanism, since actor nodes can be far from each other. In this case their re-established
connections would involve numerous sensor nodes, while a shorter path might exist. To
determine the shortest path between these actor nodes, we need to introduce information about the physical
location of the nodes. We describe this refinement step in full detail in [10] but skip it here due to
lack of space.
Table 1: Proof Statistics

Model            Number of Proof   Automatically   Interactively
                 Obligations       Discharged      Discharged
Context            4                 4 (100%)        0 (0%)
Initial Model     26                15 (58%)        11 (42%)
1st Refinement    19                13 (68%)         6 (32%)
2nd Refinement    95                34 (36%)        61 (64%)
3rd Refinement    35                32 (91%)         3 (9%)
Total            179                98 (54%)        81 (46%)

6 Proof Statistics 

The proof statistics of our development are shown in Table 1. These figures give the number of proof
obligations generated by the Rodin platform, as well as the number of obligations discharged
automatically by the platform and the number proved interactively. There are significantly more proof
obligations in the second refinement because it introduces the details of the sensor network and refines
the recovery algorithm to use the sensor nodes. To guarantee the correctness of the recovery algorithm,
new invariants had to be added and proven. Because the Rodin platform lacks adequate automatic support
for reasoning about set comprehensions and unions, we faced a high number of interactive
proofs. In addition, interactive proving often involved manually suggesting values in order to discharge
properties containing logical disjunctions or existential quantifiers. Another difficulty
was that the Rodin tool has no capability to create proof scripts and reuse them when
needed, as is possible in HOL [8], Isabelle [9] and PVS [14]. Therefore, in some cases we had to
manually repeat very similar or almost identical proofs.



7 Contribution of the paper 

The recovery algorithm that we model in this paper was introduced and simulated in [11]. One
part of our contribution here consists in proving its successful termination. More importantly, we set up a
formal model of arbitrary WSANs that can evolve dynamically by adding nodes and their corresponding
links as well as by removing nodes and their links. Regarding the actor link recovery, we show that it
is possible in three different forms that successively refine each other. Although we do not present it in
this paper, these three recovery forms can be completely separated from each other in distinct Event-B
events.

Specifically, we formalized the direct actor-actor recovery that relies on the global network
information provided by the closure construct. Moreover, we successfully proved termination of this formal
recovery. In the next two refinements, we specified indirect actor-actor recovery via arbitrary nodes or
via sensors. In these two cases, we do not need the global network information to perform the recovery but
rely instead on the information stored locally, i.e., by the l_net relation. Since these two forms of indirect
recovery are correct refinements of the direct actor-actor recovery, we can deduce that the distributed
recovery is also successfully terminating. Finally, the last refinement step (unfortunately omitted due to
lack of space) introduces the physical distance information into our model.

As a result, our formal development achieved a complete formalization of the original algorithm
presented in [11]. The model presented in this paper demonstrates the power and applicability of
the formal refinement approach. The original algorithm in [11] consists only of the third recovery form,
with actors, sensors, the l_net relation as well as the physical distance information detailed in [10]. While
this form is quite complex to model and prove to terminate, we have shown how to start from a more
abstract version and prove the termination for it. The stepwise refinement of this initial model added the
required complexity while keeping the desired termination property valid.

The more general message of this paper is that its results can apply to any network
with two categories of nodes, some more powerful than the others and coordinating with all the rest. The
algorithm we have modeled is essentially a general one that can be reused as the basis for more complex
networks. We therefore aim towards having a collection of pre-proven templates that can be reused in
similar situations. This aim is in agreement with creating the collection of parameterized refinement
patterns for Event-B, which is one of the goals of the DEPLOY project.



8 Conclusions 

In this paper, we have formalized a distributed recovery algorithm in Event-B. The algorithm addresses
the network partitioning problem caused by actor failures in WSANs. We have modeled the algorithm
and the corresponding actor coordination links at three successively more concrete levels that refine each
other, and we have proved the refinements formally using the theorem prover of the Rodin platform [15]. The most
interesting aspect put forward by our refinement modeling is the development of an actor coordination
link that can take three forms: a direct actor-actor link, an indirect, not further specified path, or an
indirect path through sensor nodes. We developed this link as a refinement with the precise purpose
of replacing the first form with the second and then the third one. However, the refinement shows that all three
forms can be present in a network and thus provide various coordination alternatives for actors. In this
respect, one can define coordination classes, e.g., for delegating the most security sensitive coordination
to the direct actor-actor coordination links, the least real-time constrained coordination to indirect links,
and the safety critical coordination to both direct actor links and indirect sensor paths between actors.
This observation can prove very useful in practice.

Using the sensor infrastructure as a temporary backup for actor coordination also aligns with the growing
body of sustainability research on using resources without depleting them. Upon detecting a direct actor-actor
coordination link between two actor nodes, all sensor nodes contributing to a communication link
between these actor nodes should be released from their backup task, a feature outside the scope of this paper.

Our formal WSAN model is a first attempt at formalizing WSAN algorithms in Event-B, and hence
the model can be extended considerably. For instance, non-deterministically adding and removing
nodes is a useful feature for these networks as it models their dynamic scalability mechanism as well
as their uncontrollable failures. However, non-deterministically adding links is just an abstraction for
nodes detecting each other in wireless range and connecting via various protocols. Hence, the WSAN
formal modeling space is quite generous and we intend to investigate it further, e.g., by modeling various
temporal properties as well as real-time aspects and by verifying other algorithms.




References 

[1] A. Abbasi, K. Akkaya, M. Younis, A Distributed Connectivity Restoration Algorithm in Wireless Sensor 
and Actor Networks, In the Proceeding of 32nd IEEE Conference on Local Computer Networks (LCN), pp. 
496-503, Dublin, Ireland, March 2007. 

[2] J. R. Abrial, A system development process with Event-B and the Rodin platform, In M. Butler, M. G. Hinchey 
and M. M. Larrondo-Petrie, (eds.), Proceedings of International Conference on Formal Engineering Methods 
(ICFEM 2007). Lecture Notes in Computer Science (LNCS), vol. 4789, pp. 1-3. Springer, 2007. 

[3] J. R. Abrial, Modeling in Event-B: System and Software Design, Cambridge University Press, Cambridge,
2010.

[4] J. R. Abrial. The B-Book: Assigning Programs to Meanings. Cambridge University Press, 1996.

[5] K. Akkaya and M. Younis, Coverage and latency aware actor placement mechanisms in WSANs, International 
Journal of Sensor Networks, vol. 3, no. 3, pp. 152-164, 2008.

[6] I. F. Akyildiz, and I. H. Kasimoglu, Wireless Sensor and Actor Networks: Research Challenges, Elsevier Ad 
hoc Network Journal, vol. 2, No. 4, pp. 351-367, 2004. 

[7] R. J. Back and K. Sere. Stepwise Refinement of Action Systems. In J. L. A. van de Snepscheut (ed), Proceed- 
ings of Mathematics of Program Construction (MPC'89), pp. 115-138, 1989. 

[8] HOL proof assistant, http://hol.sourceforge.net/ 

[9] Isabelle proof assistant, http://www.cl.cam.ac.uk/research/hvg/Isabelle/ 

[10] M. Kamali, L. Laibinis, L. Petre, and K. Sere. Reconstructing Coordination Links in Sensor-Actor 
Networks. Technical report in Turku Centre for Computer Science (TUCS) Technical Reports Series, 
www.tucs.fi, Technical Report 967, February 2010. 

[11] M. Kamali, S. Sedighian and M. Sharifi, A Distributed Recovery Mechanism for Actor-Actor Connectivity 
in Wireless Sensor Actor Networks, In the Proceedings of IEEE International Conference on Intelligent 
Sensors, Sensor Networks and Information Processing (ISSNIP), pp. 183-188, Australia, 2008. 

[12] S. Katz. A Superimposition Control Construct for Distributed Systems. In the Proceedings of ACM Transac- 
tions on Programming Languages and Systems, vol. 15, no. 2, pp. 337-356, 1993. 

[13] T. Melodia, D. Pompili, V. C. Gungor and I. F. Akyildiz, Communication and Coordination in Wireless 
Sensor and Actor Networks, In the Proceedings of IEEE Transactions on Mobile Computing, vol. 6 No. 10, 
pp. 1116-1129, 2007.

[14] PVS Specification and Verification System, http://pvs.csl.sri.com/ 

[15] RODIN tool platform, http://www.event-b.org/platform.html. 



